perturbed leader
Follow the Perturbed Leader: Optimism and Fast Parallel Algorithms for Smooth Minimax Games
We consider the problem of online learning and its application to solving minimax games. For the online learning problem, Follow the Perturbed Leader (FTPL) is a widely studied algorithm which enjoys the optimal $O(T^{1/2})$ \emph{worst case} regret guarantee for both convex and nonconvex losses. In this work, we show that when the sequence of loss functions is \emph{predictable}, a simple modification of FTPL which incorporates optimism can achieve better regret guarantees, while retaining the optimal worst-case regret guarantee for unpredictable sequences. A key challenge in obtaining these tighter regret bounds is the stochasticity and optimism in the algorithm, which requires different analysis techniques than those commonly used in the analysis of FTPL. The key ingredient we utilize in our analysis is the dual view of perturbation as regularization.
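To make the optimistic update concrete, here is a minimal sketch in Python for the special case of online linear losses over a box, where the argmin is cheap to compute. The callback names and the box domain are illustrative assumptions; the paper states the algorithm for general decision sets via an optimization oracle.

```python
import numpy as np

def optimistic_ftpl(observe_grad, guess_grad, d, eta, T, seed=0):
    """Sketch of optimistic FTPL for online linear losses over the box
    [0, 1]^d. `guess_grad(t)` returns the guess g_t of the upcoming
    gradient; `observe_grad(t, x)` returns the realized gradient nabla_t.
    Both callbacks, and the box domain, are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    cum_grad = np.zeros(d)
    plays = []
    for t in range(T):
        # Fresh exponential perturbation each round, scale ~ 1/eta.
        sigma = rng.exponential(scale=1.0 / eta, size=d)
        # Optimistic sum: past gradients + guess of the next one - noise.
        score = cum_grad + guess_grad(t) - sigma
        # Linear argmin over [0,1]^d: set x_i = 1 wherever score_i < 0.
        x_t = (score < 0).astype(float)
        plays.append(x_t)
        cum_grad += observe_grad(t, x_t)  # observe the true gradient
    return plays
```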
Review for NeurIPS paper: Follow the Perturbed Leader: Optimism and Fast Parallel Algorithms for Smooth Minimax Games
Summary and Contributions: - It is known in the literature that optimistic variants of the FTRL algorithm can yield better bounds when the sequence of loss functions is predictable. Such results are relatively rare for FTPL. This paper proposes an optimistic variant of the FTPL algorithm which matches the known optimal bounds in the worst case, but has the potential to achieve better regret for predictable sequences of loss functions. Specifically, the bounds depend on $\|g_t - \nabla_t\|_*$, where $g_t$ is the estimate of the gradient of the next loss function and $\nabla_t$ is the observed gradient. They instantiate this generic result for the worst-case analysis by setting the future estimate $g_t = 0$, recovering the optimal $O(T^{1/2})$ regret.
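Schematically, the adaptive guarantee the review describes takes the following form (constants and dimension factors elided; the precise statement is in the paper):
$$\mathrm{Regret}_T \;\lesssim\; \eta \sum_{t=1}^{T} \|g_t - \nabla_t\|_*^2 \;+\; \frac{1}{\eta}.$$
With the uninformative guess $g_t = 0$ and bounded gradients, tuning $\eta \asymp T^{-1/2}$ recovers the worst-case $O(T^{1/2})$ regret, while an accurate predictor $g_t \approx \nabla_t$ makes the first term vanish.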
Review for NeurIPS paper: Follow the Perturbed Leader: Optimism and Fast Parallel Algorithms for Smooth Minimax Games
The paper provides a Follow the Perturbed Leader algorithm and analysis that obtain better regret bounds when the loss/gradient sequence is predictable. The proof relies on the equivalent regularization view of FTPL. The authors also provide an application of this result, giving parallelizable algorithms for solving smooth convex-concave saddle-point games. Most of the reviewers found the result interesting. Please address the concerns of the reviewer. Personally, I find the predictable-sequences result interesting.
Replicable Online Learning
Ahmadi, Saba, Bhandari, Siddharth, Blum, Avrim
We investigate the concept of algorithmic replicability introduced by Impagliazzo et al. 2022, Ghazi et al. 2021, Ahn et al. 2024 in an online setting. In our model, the input sequence received by the online learner is generated from time-varying distributions chosen by an adversary (obliviously). Our objective is to design low-regret online algorithms that, with high probability, produce the exact same sequence of actions when run on two independently sampled input sequences generated as described above. We refer to such algorithms as adversarially replicable. Previous works (such as Esfandiari et al. 2022) explored replicability in the online setting under inputs generated independently from a fixed distribution; we term this notion iid-replicability. Our model generalizes to capture both adversarial and iid input sequences, as well as their mixtures, which can be modeled by setting certain distributions as point-masses. We demonstrate adversarially replicable online learning algorithms for online linear optimization and the experts problem that achieve sub-linear regret. Additionally, we propose a general framework for converting an online learner into an adversarially replicable one within our setting, bounding the new regret in terms of the original algorithm's regret. We also present a nearly optimal (in terms of regret) iid-replicable online algorithm for the experts problem, highlighting the distinction between the iid and adversarial notions of replicability. Finally, we establish lower bounds on the regret (in terms of the replicability parameter and time) that any replicable online algorithm must incur.
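The shared-randomness idea behind replicability can be illustrated with a standard randomized-rounding trick. This is a folklore sketch, not the paper's algorithm; the `grid` parameter and the interface are hypothetical.

```python
import numpy as np

def replicable_experts(loss_stream, k, T, grid=0.05, seed=0):
    """Folklore randomized-rounding sketch of replicability. Two runs
    sharing `seed` snap average losses to the same random grid, so small
    sampling differences between independently drawn input sequences
    rarely change the chosen expert."""
    shared = np.random.default_rng(seed)        # shared across both runs
    offset = shared.uniform(0.0, grid, size=k)  # one random grid shift
    cum = np.zeros(k)
    actions = []
    for t in range(1, T + 1):
        avg = cum / max(t - 1, 1)
        # Round each average loss down to the shared shifted grid.
        rounded = np.floor((avg - offset) / grid) * grid + offset
        actions.append(int(np.argmin(rounded)))
        cum += loss_stream(t)  # vector of the k experts' losses
    return actions
```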
Solving Robust MDPs through No-Regret Dynamics
Guha, Etash Kumar, Lee, Jason D.
Reinforcement Learning is a powerful framework for training agents to navigate different situations, but it is susceptible to changes in environmental dynamics. Solving Markov Decision Processes that are robust to such changes is difficult due to nonconvexity and the size of the action and state spaces. While most works have analyzed this problem under differing assumptions, a general and efficient theoretical analysis is still missing. We give a simple framework for improving robustness by solving a minimax iterative optimization problem in which a policy player and an environmental-dynamics player play against each other. Leveraging recent results in online nonconvex learning and techniques for improving policy gradient methods, we obtain an algorithm that maximizes the robustness of the Value Function at a rate of $\mathcal{O}\left(\frac{1}{T^{\frac{1}{2}}}\right)$, where $T$ is the number of iterations of the algorithm.
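A generic instance of the no-regret dynamics this abstract alludes to, for a smooth convex-concave objective, looks as follows. This is an illustrative sketch only; the paper's policy-vs-dynamics game is nonconvex and requires considerably more machinery.

```python
import numpy as np

def no_regret_minimax(grad_x, grad_y, x0, y0, T, eta):
    """Simultaneous gradient descent/ascent with iterate averaging: for
    a smooth convex-concave f, the averaged pair approaches a saddle
    point at rate O(1/sqrt(T))."""
    x, y = x0.copy(), y0.copy()
    xs, ys = [], []
    for _ in range(T):
        gx, gy = grad_x(x, y), grad_y(x, y)
        x = x - eta * gx  # policy player descends on f
        y = y + eta * gy  # dynamics player ascends on f
        xs.append(x.copy())
        ys.append(y.copy())
    return np.mean(xs, axis=0), np.mean(ys, axis=0)
```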
Introduction to Multi-Armed Bandits
Multi-armed bandits are a simple but very powerful framework for algorithms that make decisions over time under uncertainty. An enormous body of work has accumulated over the years, covered in several books and surveys. This book provides a more introductory, textbook-like treatment of the subject. Each chapter tackles a particular line of work, providing a self-contained, teachable technical introduction and a review of the more advanced results. The chapters are as follows: Stochastic bandits; Lower bounds; Bayesian Bandits and Thompson Sampling; Lipschitz Bandits; Full Feedback and Adversarial Costs; Adversarial Bandits; Linear Costs and Semi-bandits; Contextual Bandits; Bandits and Zero-Sum Games; Bandits with Knapsacks; Incentivized Exploration and Connections to Mechanism Design.
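As a taste of the book's starting point (stochastic bandits), here is a standard UCB1 sketch; `pull` is a hypothetical environment callback returning rewards in $[0, 1]$.

```python
import numpy as np

def ucb1(pull, k, T):
    """UCB1 for stochastic bandits: pull each arm once, then play the
    arm maximizing the empirical mean plus a confidence bonus."""
    counts = np.zeros(k)
    means = np.zeros(k)
    for a in range(k):  # initialization: one pull per arm
        means[a] = pull(a)
        counts[a] = 1
    for t in range(k + 1, T + 1):
        bonus = np.sqrt(2.0 * np.log(t) / counts)
        a = int(np.argmax(means + bonus))
        r = pull(a)
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]  # incremental mean update
    return means, counts
```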
Online Non-Convex Learning: Following the Perturbed Leader is Optimal
Suggala, Arun Sai, Netrapalli, Praneeth
We study the problem of online learning with non-convex losses, where the learner has access to an offline optimization oracle. We show that the classical Follow the Perturbed Leader (FTPL) algorithm achieves optimal regret rate of $O(T^{-1/2})$ in this setting. This improves upon the previous best-known regret rate of $O(T^{-1/3})$ for FTPL. We further show that an optimistic variant of FTPL achieves better regret bounds when the sequence of losses encountered by the learner is `predictable'.
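A minimal sketch of oracle-based FTPL in the spirit of this paper: each round draws a fresh exponential linear perturbation and hands the perturbed cumulative loss to an exact offline minimizer. The `oracle` interface is an assumption stated by the paper, not implemented here.

```python
import numpy as np

def ftpl_nonconvex(oracle, loss_stream, d, eta, T, seed=0):
    """FTPL with an offline optimization oracle. `oracle(fn)` is a
    hypothetical exact minimizer of `fn` over the bounded decision set;
    `loss_stream(t)` returns the round-t loss function f_t."""
    rng = np.random.default_rng(seed)
    past = []   # losses observed so far
    plays = []
    for t in range(T):
        sigma = rng.exponential(scale=1.0 / eta, size=d)
        # Perturbed cumulative loss: sum of past losses minus a random
        # linear term (default args snapshot the current state).
        def perturbed(x, fs=tuple(past), s=sigma):
            return sum(f(x) for f in fs) - s @ x
        x_t = oracle(perturbed)
        plays.append(x_t)
        past.append(loss_stream(t))  # observe f_t after playing
    return plays
```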